fix(run-engine): carryover batchId after PENDING_EXECUTING stalls #2563
Walkthrough
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (2)
1007-1027: Missing batchId when requeuing after retry
In the retry path that nacks and requeues, batchId isn’t forwarded, so the new snapshot can lose batch context. Pass latestSnapshot.batchId here too.
Apply:
- const nackResult = await this.tryNackAndRequeue({
+ const nackResult = await this.tryNackAndRequeue({
    run,
    environment: run.runtimeEnvironment,
    orgId: run.runtimeEnvironment.organizationId,
    projectId: run.runtimeEnvironment.project.id,
    timestamp: retryAt.getTime(),
    error: {
      type: "INTERNAL_ERROR",
      code: "TASK_RUN_DEQUEUED_MAX_RETRIES",
      message: `We tried to dequeue the run the maximum number of times but it wouldn't start executing`,
    },
+   batchId: latestSnapshot.batchId ?? undefined,
    tx: prisma,
  });
1133-1164: Thread batchId through all tryNackAndRequeue call sites
- internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:1015 – add batchId: run.batchId
- internal-packages/run-engine/src/engine/systems/dequeueSystem.ts:634 – add batchId: run.batchId
- internal-packages/run-engine/src/engine/index.ts:1447 – add batchId: run.batchId
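The idea behind threading batchId through every call site can be sketched as below. This is a minimal illustration, not the run-engine code: the `Snapshot` and `RequeueParams` types and the `createQueuedSnapshot` and `nackAndRequeue` functions are hypothetical stand-ins for the real createExecutionSnapshot and tryNackAndRequeue, whose actual signatures are not shown in this review.

```typescript
// Hypothetical, simplified types: not the real run-engine definitions.
type Snapshot = {
  runId: string;
  executionStatus: "QUEUED" | "EXECUTING" | "PENDING_EXECUTING";
  batchId?: string;
};

type RequeueParams = {
  runId: string;
  // Optional, so call sites without batch context still type-check.
  batchId?: string;
};

// Stand-in for createExecutionSnapshot: copies batchId only when present.
function createQueuedSnapshot(params: RequeueParams): Snapshot {
  return {
    runId: params.runId,
    executionStatus: "QUEUED",
    ...(params.batchId !== undefined ? { batchId: params.batchId } : {}),
  };
}

// Stand-in for tryNackAndRequeue: forwards batchId to the new snapshot.
function nackAndRequeue(params: RequeueParams): Snapshot {
  return createQueuedSnapshot(params);
}

// A call site that forwards batchId keeps the batch association.
const withBatch = nackAndRequeue({ runId: "run_1", batchId: "batch_1" });
// A call site that omits it produces a snapshot with no batch context.
const withoutBatch = nackAndRequeue({ runId: "run_2" });
```

Because the parameter is optional, the compiler does not flag a call site that forgets it, which is presumably why each call site has to be checked by hand.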
🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
1039-1056: Consider propagating batchId on immediate (short‑delay) retry snapshots
For consistency, include batchId when creating a new EXECUTING snapshot without requeue, so downstream logic relying on batch context remains intact.
Apply:
  const newSnapshot = await this.executionSnapshotSystem.createExecutionSnapshot(
    prisma,
    {
      run,
      snapshot: {
        executionStatus: "EXECUTING",
        description: "Attempt failed with a short delay, starting a new attempt",
      },
      previousSnapshotId: latestSnapshot.id,
      environmentId: latestSnapshot.environmentId,
      environmentType: latestSnapshot.environmentType,
      projectId: latestSnapshot.projectId,
      organizationId: latestSnapshot.organizationId,
+     batchId: latestSnapshot.batchId ?? undefined,
      workerId,
      runnerId,
    }
  );
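The `?? undefined` in the suggestion is the usual way to bridge a nullable stored field and an optional parameter. A minimal sketch follows, assuming the stored snapshot exposes batchId as `string | null` and the snapshot input accepts an optional `batchId`. Both shapes and the `nextSnapshotInput` helper are hypothetical, not the actual run-engine types.

```typescript
// Hypothetical shapes, assumed for illustration only.
type StoredSnapshot = { id: string; batchId: string | null };
type NewSnapshotInput = {
  previousSnapshotId: string;
  batchId?: string | undefined;
};

// Hypothetical helper: builds the input for the next snapshot.
function nextSnapshotInput(latest: StoredSnapshot): NewSnapshotInput {
  return {
    previousSnapshotId: latest.id,
    // null (no batch) becomes undefined; a real id passes through unchanged.
    batchId: latest.batchId ?? undefined,
  };
}

const inBatch = nextSnapshotInput({ id: "snap_1", batchId: "batch_1" });
const standalone = nextSnapshotInput({ id: "snap_2", batchId: null });
```

Here `inBatch.batchId` carries the batch id forward, while `standalone.batchId` is `undefined` rather than `null`, so it satisfies an optional `string` parameter.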
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- internal-packages/run-engine/src/engine/index.ts (1 hunk)
- internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}:
- Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
- For TypeScript, we usually use types over interfaces
- Avoid enums
- No default exports, use function declarations
Files:
internal-packages/run-engine/src/engine/index.ts
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts (1)
1198-1213: Good: snapshot includes batchId on requeue
Passing batchId into createExecutionSnapshot for the QUEUED snapshot preserves batch context after nack/requeue.
internal-packages/run-engine/src/engine/index.ts (1)
1453-1464: Good: pass batchId when requeuing stalled PENDING_EXECUTING
This ensures the requeued QUEUED snapshot retains the batch association.
This fixes batch resumes after stalling. Existing runs that are stuck here and are already EXECUTING again will have to be replayed.